[11150/30] 

METHOD AND DEVICE FOR OUTPUTTING INFORMATION 
AND/OR STATUS MESSAGES, USING SPEECH 

FIELD OF THE INVENTION 

The present invention relates to a method and a device for 
outputting information and/or status messages of at least one 
electrical device, using speech. 

BACKGROUND INFORMATION 

Methods and devices of this type are generally used in
so-called interactive voice-communication systems or
voice-controlled systems for, e.g., vehicles, computers, robots,
machines, equipment, etc.

In general, an interactive voice-communication system (SDS) 
can essentially be reduced to the following components: 

- Speech recognition system, which compares an orally input
  command ("voice command") to other allowed voice commands,
  and decides which command, in all probability, was orally
  input;

- Voice output, which outputs the voice commands and signal
  tones necessary for prompting the user, and possibly
  acknowledges the recognition result;

- Dialog and sequencing control, in order to explain to the
  user which type of input is expected, to check if the input
  is consistent with the prompt and the current status of the
  application, and to trigger the resulting action in the
  application (for example, the device to be controlled);

- Control interface as an interface to the application: Hidden
  behind it are hardware and software modules for controlling
  various actuators and computers, which contain the
  application; and

- Application that is controlled by speech: For example, it
  can be an ordering or information system, a CAE workstation,
  or a wheelchair for the disabled.



For example, such a voice recognition system is described in
German Published Patent Application No. 195 33 541. To increase
the acceptance of such man-machine dialog, synonymous words or
various pronunciations are used for the commands, or the words
within the commands are rearranged. For example, "larger radius
when turning left" can alternatively be expressed as "when
turning left, larger radius". In addition, a multilingual,
speaker-independent interactive communication system can be set
up by expanding the memory, it being possible to switch between
the interactive communication systems of the various languages.
In addition, ellipses may be used, i.e., the repetition of
complete command sentences is dispensed with and commands such
as "higher", "sharper", or "further" are used instead, the
voice recognition system then assigning these to the preceding
commands. In response to uncertain recognition, the voice
recognition system can also pose questions such as "Excuse
me?", "Please repeat that", or "What else?", or issue specific
suggestions such as "Louder, please". All of these measures are
used to avoid monotonous communication and to have the dialog
more closely approximate human-to-human communication.
To improve the communication, the voice system is coupled to
an optical display medium, on which the recognized commands
are indicated for control purposes. Furthermore, the optical
display medium allows the display of functions of the target
device which are set in response to the voice command, and/or
the display of various functions/alternatives which can
subsequently be set or selected by a voice command. A
disadvantage of this device and the method implemented thereby
is that, despite the given improvements, the voice output
tires the user due to its monotony, so that his or her
reaction time is too slow during events requiring immediate
action. An additional problem is that, in response to
recognition difficulties, the voice recognition systems enter
an endless loop and issue the user the same prompt again and
again, so that the workflow is interrupted.

Therefore, it is an object of the present invention to provide
a method and a device for outputting information and/or status
messages, using speech, in which the attentiveness of the user
is improved.

SUMMARY 

The above and other beneficial objects of the present
invention are achieved by providing a device and method as
described herein.

By using different intonations, the attention of the user is
immediately obtained while the speech is being output, so that
the reaction time for performing the requested instruction is
considerably reduced. In the case of instructions requiring
immediate action, the status messages have a command
intonation.

To further increase attentiveness and the differentiation of
instructions requiring immediate action, the volume of the
voice output may be increased for instructions requiring
immediate action, and/or these instructions may be inserted in
a particularly harsh or abrupt manner.

In addition, the voice recognition system may use multiple
voices, so that, for example, one may choose between a man's
voice and a woman's voice. One of these voices is selected by
the system for instructions requiring immediate action, and the
other is selected by the system for information or status
messages not requiring immediate action.

To ensure the workflow, the voice recognition system is only
activated by actuating a "Push to talk" (PTT) switch, the
dialog-communication level being changed in the absence of a
valid interaction. To increase the recognition reliability and
improve the user prompting, individual commands may be saved in
various alternative output forms, which are then successively
output in response to an invalid interaction. The
dialog-communication level is only changed when a valid
interaction does not ensue in response to all of the command
forms. To avoid monotony, the sequence of the output may be
permuted by a random-number generator.
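
The permuted output order can be sketched as follows, in Python. This is a minimal illustration under stated assumptions: the function name shuffled_forms and the sample phrasings are hypothetical, and the standard-library random module stands in for the random-number generator.

```python
import random

def shuffled_forms(forms, rng=None):
    """Return the stored output forms of one command in random order.

    Every form appears exactly once, so the dialog-communication
    level is changed only after all forms have been tried.
    """
    rng = rng or random.Random()
    return rng.sample(list(forms), k=len(forms))

# Hypothetical alternative output forms of a single prompt.
FORMS = [
    "Please input your desired destination now.",
    "Would you like to input a destination?",
    "Which destination would you like?",
]
```

Passing a seeded random.Random instance makes the order reproducible for testing, while the default gives a different order on each dialog.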

The basis of the present invention is to use the manner in
which speech is output to the motor vehicle driver in order
to create an emotion that causes one to act in accordance with
the situation.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a flowchart illustrating a method for
automatically controlling at least one device using speech
recognition according to the present invention.

Figure 2a illustrates a graph of a potential danger during an
interaction that does not require immediate action.

Figure 2b illustrates a denotation graph corresponding to
Figure 2a.

Figure 2c illustrates an intonation graph corresponding to
Figure 2a.

Figure 2d illustrates a connotation graph corresponding to
Figure 2a.



Figure 3a illustrates a graph of a potential danger during an
interaction that requires immediate action.

Figure 3b illustrates a denotation graph corresponding to
Figure 3a.

Figure 3c illustrates an intonation graph corresponding to
Figure 3a.

Figure 3d illustrates a connotation graph corresponding to
Figure 3a.

DETAILED DESCRIPTION 

The voice recognition system is activated by actuating a PTT
switch. For clarity, the voice output of the voice recognition
system is subdivided into commands KOM and prompts Auff which,
in reality, may be identical. Hereinafter, commands KOM are to
be understood as a direct instruction to act, such as "BRAKE"
or "TURN ON LIGHT", whereas prompts Auff request an interaction
in the form of an input, such as "Please specify desired
temperature in degrees C."



If the voice recognition system now generates a command KOM,
then this command KOM is classified according to whether it is
an instruction requiring immediate action or an instruction not
requiring immediate action. More simply, instructions requiring
immediate action are commands KOM that call for the action to
be performed quickly. An example of this is the command KOM
"Brake", when an ADR system or a precrash sensory system has
detected a collision object. Examples of instructions not
requiring immediate action include commands KOM of a navigation
system. In this context, instructions requiring immediate
action are inserted at time t1, with command-intonation voice
S1 and high volume L1, in a harsh and abrupt manner, in order
to produce a high degree of attentiveness in the user. However,
instructions not requiring immediate action are inserted
softly, at low volume L2 and normal intonation S2.
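
The distinction drawn above can be sketched in Python as a simple mapping from urgency to output style. The class OutputStyle, its field names, and the function style_for_command are hypothetical names introduced for illustration; only the labels S1, S2, L1, and L2 come from the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutputStyle:
    intonation: str  # "S1" command intonation, "S2" normal intonation
    volume: str      # "L1" high volume, "L2" low volume
    onset: str       # how the message is inserted: "abrupt" or "soft"

def style_for_command(requires_immediate_action: bool) -> OutputStyle:
    """Select the voice-output style for a command KOM."""
    if requires_immediate_action:
        # Harsh, abrupt, loud: aimed at a high degree of attentiveness.
        return OutputStyle(intonation="S1", volume="L1", onset="abrupt")
    # Soft insertion at low volume and normal intonation.
    return OutputStyle(intonation="S2", volume="L2", onset="soft")
```

For example, style_for_command(True) would be used for "Brake" issued after a precrash sensor detects a collision object, and style_for_command(False) for a navigation command.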

As a rule, time is not a critical factor in the case of
prompts Auff, so that, in this case, good user prompting is of
concern. For this purpose, n different alternatives of a prompt
Auff may be stored in the speech memory. For example, the
alternatives may be different emphases, pronunciations, word
rearrangements, or synonymous terms. After acoustically
outputting the first alternative, the voice recognition system
waits for a predetermined period of time for an interaction. If
no interaction or an invalid interaction occurs within this
time period, then the voice recognition system repeats the
prompt, using the subsequent alternative, up to the nth
alternative if necessary. If a valid interaction occurs, then
this request is performed and, if necessary, a new prompt Auff
is output. But if no valid interaction occurs in response to
the nth alternative, then the system switches to another
dialog-communication level DKE, in order to ensure the
workflow. For example, the new dialog-communication level DKE
may then be a selection list, which is displayed on the
trip-computer monitor, and from which the user may select a
corresponding menu.
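
The prompt sequence described above can be sketched in Python as a bounded loop. This is only an illustration under stated assumptions: the function name run_prompt, the callback parameters, the outcome labels, and the 5-second default for the predetermined waiting period are all hypothetical.

```python
def run_prompt(alternatives, speak, wait_for_input, is_valid, timeout_s=5.0):
    """Output up to n alternatives of a prompt Auff.

    speak(text) outputs one alternative acoustically.
    wait_for_input(timeout_s) returns the user's input, or None if
    nothing arrives within the waiting period.
    is_valid(value) decides whether the interaction is valid.

    Returns ("valid", value) on success, or ("switch_level", None)
    when all n alternatives are exhausted, signalling a change to
    another dialog-communication level DKE.
    """
    for text in alternatives:
        speak(text)
        value = wait_for_input(timeout_s)
        if value is not None and is_valid(value):
            return ("valid", value)
    return ("switch_level", None)
```

Because the loop runs over a finite list, the same prompt cannot be repeated endlessly; the caller reacts to "switch_level" by, for example, showing a selection list on the monitor.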

Figures 2a to 2d schematically represent the conditions for an
instruction not requiring immediate action, such as an
information prompt for a navigation system. In Fig. 2a, the
importance of the interaction is plotted over time.
Instructions for action are output at times t0 to t2, and it is
assumed that there was no reaction to each preceding prompt.
Since a missing input in the navigation system only results in
the inoperability of comfort components, which are also not
necessarily desired by the motor vehicle driver, the
importance does not change over time. The information
regarding the content of the command, or the so-called
denotation, i.e., the input request, also remains constant over
time, as illustrated in Fig. 2b. At time t0, the motor vehicle
driver may be prompted, "Please input your desired destination
now." This prompt is issued using a certain intonation I1 and
a certain connotation K1, which are illustrated in Figs. 2c
and 2d. If nothing is input, then the system does not know
the reason for the omission, e.g., whether the motor vehicle
driver did not hear the request or deliberately chose not to
perform it. Therefore, the prompt, "Would you like to input a
destination?" is issued again at time t1, using a stronger
intonation I2, in order to improve the likelihood of it being
perceived. However, connotation level K2 decreases. If, in
response, nothing is input again, then the system may
determine with certainty that the motor vehicle driver does not
wish to do this. To avoid annoying the motor vehicle driver
with constant repetition, a prompt such as "If you do not wish
to input a destination, I will now turn myself off" is then
issued one last time, at time t2. This last prompt is output
using a very low intonation I3, and it has only a low
connotation. As illustrated in Fig. 2d, the connotation forms
an anticlimax, i.e., a transition from a strong to a weak
expression, whereas a certain variation occurs in the
intonation, in order to counteract monotony.
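
The qualitative course of Figs. 2c and 2d can be sketched in Python as a small table. The numeric levels are purely illustrative assumptions, since the figures are qualitative; the helper name is_anticlimax is likewise hypothetical.

```python
# (time, prompt text, intonation level, connotation level)
# Illustrative numbers only: I2 is stronger than I1, I3 is very
# low, and the connotation falls from prompt to prompt.
PROMPTS = [
    ("t0", "Please input your desired destination now.", 2, 3),
    ("t1", "Would you like to input a destination?", 3, 2),
    ("t2", "If you do not wish to input a destination, "
           "I will now turn myself off.", 1, 1),
]

def is_anticlimax(levels):
    """True if each level is strictly weaker than the one before."""
    return all(a > b for a, b in zip(levels, levels[1:]))

intonations = [p[2] for p in PROMPTS]
connotations = [p[3] for p in PROMPTS]
```

With these sample values the connotation sequence is an anticlimax, while the intonation sequence rises and then falls, which is the variation intended to counteract monotony.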

In contrast, Figs. 3a to 3d illustrate a situation in which
the importance of the interaction increases over time, until
action is finally required. For example, the motor vehicle
travels on a motorway at a speed greater than an allowed speed,
while maintaining the safety distance behind another motor
vehicle. At time t0, the system issues an action instruction to
the motor vehicle driver, e.g., in the form of "Please adjust
your speed." The action instruction has a low intonation degree
I1 and a correspondingly low connotation level K1, since the
motor vehicle driver is indeed acting illegally, but no
immediate danger exists. In addition, it is now assumed that
the motor vehicle driver does not adjust his or her speed, and
that his or her distance has just barely fallen below the
safety distance at time t1. In other words, the potential
danger of the traffic situation increases, which is illustrated
by the rising curve in Fig. 3a.

Consequently, the system issues the motor vehicle driver an
action instruction, e.g., in the form of "You must brake" or
"Please brake", this action instruction having a higher
intonation degree I2 along with a correspondingly higher
connotation level K2. If the motor vehicle driver also does
not react to this, then the potential danger of the traffic
situation is increased further, which is illustrated by the
additional rise in Fig. 3a. This means that a further failure
of the motor vehicle driver to react could lead to an accident
in a very short time. This instruction requiring immediate
action can, for example, be output in the form of "Brake
hard", using command intonation I3. In this case, the
connotation levels illustrated in Fig. 3d represent a climax,
i.e., the increase in the expression, from less important to
more important. In addition, it should be noted that the
changes illustrated in Figs. 2a to 2d and Figs. 3a to 3d are
not to scale, but are rather to be understood as qualitative
information.
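
The escalation of Figs. 3a to 3d can be sketched in Python as a staged lookup. The function names danger_stage and instruction_for and the three boolean inputs are hypothetical; how the vehicle actually derives these conditions is not specified here.

```python
# Action instructions in order of rising potential danger.
# Each entry: (text, intonation degree as named in the description).
ESCALATION = [
    ("Please adjust your speed.", "I1"),
    ("Please brake.", "I2"),
    ("Brake hard", "I3"),  # command intonation
]

def danger_stage(speeding, below_safety_distance, collision_imminent):
    """Map the traffic situation to an escalation stage, or None."""
    if collision_imminent:
        return 2
    if below_safety_distance:
        return 1
    if speeding:
        return 0
    return None

def instruction_for(stage):
    """Return the (text, intonation) pair for a stage, if any."""
    return None if stage is None else ESCALATION[stage]
```

The most severe condition is checked first, so the instruction always reflects the highest applicable stage, matching the climax in connotation from less important to more important.
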

ABSTRACT

In a method and device for outputting information and/or
messages from at least one device using speech, the
information and/or messages required for vocal output are
provided in a voice memory, the information and/or messages
are read by a processing device according to a demand, and the
information and/or messages are output via an acoustic output
device. The information and/or messages are output with a
varying intonation according to their relevance.